Driving Consensus Through Repos

R/Pharma Summit @ Posit Conf 2024, August 11th

Slides available: pharmar.github.io/events-positconf2024


On behalf of the R Validation Hub team:

Aaron Clark Arcus Biosciences

Doug Kelkhoff Roche

👋 Who We Are

The R Validation Hub is a collaboration to support the adoption of R within a biopharmaceutical regulatory setting (pharmaR.org)

  • Grew out of R/Pharma 2018
  • Led by participants from ~10 organizations
  • With frequent involvement from health authorities (primarily the FDA)
  • And subscribers from ~60 organizations spanning multiple industries

🤝 Affiliates:

Works with and provides support to the R Foundation and to the key organizations developing, maintaining, distributing and using R software

Key Pharma Activities

  • The R Validation Hub
  • R Submission Working Group
  • R/Medicine
  • R-Hub
  • R Repositories Working Group (i.e., CRAN enhancements, future development)

Impact

Community Grants & Sponsorships

Over USD $1.4 Million

Organizing Large Scale Collaborative Projects

R Validation Hub, R-Ladies

Co-Host Multidisciplinary Data Science Forums

Stanford Data Institute

Direct Support for Key R Events

R/Medicine, R/Pharma, useR!, LatinR, and more

Direct Worldwide Support for R User Groups

Join the R Consortium

r-consortium.org

  • Help guide the future direction of the R language
  • Collaborate on cross industry initiatives
  • Raise your leadership profile in the R Community
  • Protect your investment in R while supporting the common good

👷‍♂️ The R Validation Hub: What We Do

Products

White Paper

Guidance on compliant use of R and management of packages

Repositories

Building a public, validation-ready resource for R packages

Coline Zeballos

Communications

Connecting validation experts across the industry

Jaxon Abercrombie, Anuja Das, Antal Martinecz

{riskmetric}

Gather and report on risk heuristics to support validation decision-making

Eric Milliman

{riskassessment}

A web interface to {riskmetric}, supporting review, annotation and cataloging of decisions

Aaron Clark, Jeff Thompson

{riskscore}

An R data package capturing risk metrics across all of CRAN

Aaron Clark

📊 A Quick Survey

Keep your hand raised if…

  • It’s early morning and you need an excuse to stretch
  • This is your first time hearing about the R Validation Hub
  • Your org contributes to the R Validation Hub
  • Your org leverages the R Validation Hub guidelines
  • Your org uses R Validation Hub tools ({riskmetric}, {riskassessment})

🗓️ Agenda

  • Communications Workstream 5min
  • {riskassessment} App Workstream 5min
  • {riskmetric} Workstream 10min
    Watch for big changes coming
  • Repositories Workstream 25min
  • Room Discussion 10 - 15min
  • Closing

📜 Workstream Updates

🗝 Key Policy Updates!

  • R Submissions Working Group: pioneering the use of containers for delivering study analysis, with positive health authority feedback. [1]

Reminder: News within the last year

  • The FDA appears to accept .R files through their eSUB portal [2].
  • The FDA has released a draft of a new Computer Software Assurance guideline [3] that increasingly seems to be the basis for their evaluation of R.

📣 Communications Workstream

Community Meetings 🗓

In the last year…

  • May 21, 2024 - Tackling Hurdles: Embracing Open-Source Packages in Projects (GitHub/slides)
  • February 03, 2024 - Unraveling the Term “Validation” (GitHub/slides/notes)
  • November 28, 2023 - Wrapping Up 2023 and Welcoming 2024 (slides/recording)
  • August 09, 2023 - Risk Metric Application and Risk Score – A 2-part Mini Series (GitHub)
  • June 27, 2023 - Learnings & Reflections from Case Studies (GitHub/slides)

How do I sign up?

Our Next Community Meeting

  • Next date: 🗓️ Aug 20, 2024

  • Speaker: 👷‍♀️ Bríd Roberts, Novartis

  • Topic:

Analyzing change in assessed risk across package releases

The Software Open Source (SOS) team manages and executes the risk assessment process for R package validation at Novartis. The team uses an internally developed R package to classify the risk of each package as “low”, “medium”, or “high”.

We analysed the risk assessment data over two time points to determine the impact on the assigned risk categorisation for packages with AND without version changes.

In this talk, we showcase the risk assessments over time, the causes of any risk class changes, and the resulting impact on various teams within our organization.

Website Refresh

Then: Outdated, clumsy, hard to navigate…

Now: Updated, streamlined, user-friendly!

{riskassessment} App

Latest Features Recap

  • Expanded decision automation to include individual {riskmetric} assessment values

  • New Function Explorer page and faster exploration of source code

  • Expanded the package dependency view

  • Miscellaneous items

    • About tab
    • non-shinymanager deployment

The feedback loop is crucial! All of these improvements started off as community-driven suggestions on our GitHub repo. If you have an idea that isn’t already on the list of issues, submit a new issue today and it may become a reality tomorrow.

{riskassessment} App

Test drive now

https://app.pharmar.org/riskassessment/

Looking for Volunteer “Leads” to represent

{riskassessment} App

Decision Automation Rules, by metric assessment

{riskassessment} App

New Function Explorer! (Code provided by GSK)

{riskassessment} App

Package Dependency Integration

{riskmetric}

{riskmetric} Roadmap

Big changes

Running from risk scores & focusing solely on metric assessments

  • Ease of use:
    Wrapper functions for a complete workflow, prettier outputs
  • Metric completeness:
    Implement metrics for as many package sources as possible. Chain sources together to create more complete assessments
  • Modular additions:
    Allow users to easily add custom assessments, create optional assessments based on community packages (e.g. oyster, srr, pkgstats, etc.)
  • Focusing on metrics and scoring:
    Making custom weighting more robust and convenient. Guidance materials on weighting specific assessments based on community feedback and our own views on best practices.
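
To make the custom weighting idea concrete, here is a minimal base R sketch. It is not the {riskmetric} implementation: the function name `weighted_risk_sketch` and the weights are hypothetical, and it assumes metric scores lie between 0 and 1 with higher meaning better, so that risk is one minus the weighted mean. The metric values are the ones shown for the example package record later in this deck.

```r
# Hypothetical metric scores in [0, 1]; higher is assumed to be better
metric_scores <- c(covr_coverage = 0.852, has_vignettes = 1, remote_checks = 0.846)

# Hypothetical organisation-specific weights: coverage counts double
metric_weights <- c(covr_coverage = 2, has_vignettes = 1, remote_checks = 1)

# Combine scores into one risk value: 1 - weighted mean of the scores
weighted_risk_sketch <- function(scores, weights) {
  stopifnot(setequal(names(scores), names(weights)))
  w <- weights[names(scores)]  # align weights to the order of the scores
  1 - sum(scores * w) / sum(w)
}

risk <- weighted_risk_sketch(metric_scores, metric_weights)
risk
```

Changing the weights changes the risk value without re-running any assessment, which is the separation between metrics and scoring that the roadmap describes.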

📦 Repositories

Repositories Workstream

Supporting a transparent, open, dynamic, cross-industry approach of establishing and maintaining a repository of R packages.

  • A CRAN-like repository
  • Providing package qualities for risk-based decision-making
  • Evaluated against representative systems
  • “Bring-your-own” quality cut-offs
  • Declarative quality decision-making

The Pulse of the Industry

  • Our whitepaper is widely adopted
  • But implementing it is inconsistent & laborious
    • Variations throughout industry pose uncertainty
    • Sharing software with health authorities is a challenge
    • Health authorities, overwhelmed by technical inconsistencies, are more likely to question software use
  • We feel the most productive path forward is a shared ecosystem
  • Public discussion on how to characterize quality code/methods

Goals

Generating Quality Indicators

  • Provide a community-maintained catalog of package quality indicators (“risk metrics”)
  • Calculated against cohort of packages
  • Known system
  • Consistently evaluated, with transparent methods

Consolidate Decision-Making

  • Serve subsets of packages that conform to a specified risk tolerance
  • Transparently demonstrate selection criteria
  • Allows for one-off analysis from public repo
  • ... or mirroring of a filtered snapshot
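
One way to picture "subsets of packages that conform to a specified risk tolerance" is a set of named minimum thresholds applied to a table of metrics. The sketch below is illustrative only: `conforming_packages` and the two tolerance levels are hypothetical, and the package values are the placeholder `pkg_1` to `pkg_5` figures used in the workflow slides of this deck.

```r
# Placeholder package metrics (values taken from the workflow slides)
meta <- data.frame(
  Package       = c("pkg_1", "pkg_2", "pkg_3", "pkg_4", "pkg_5"),
  covr_coverage = c(0.967, 0.984, 0.992, 0.864, 0.924),
  has_vignettes = c(1, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep packages meeting every minimum in `minimums`
conforming_packages <- function(meta, minimums) {
  ok <- rep(TRUE, nrow(meta))
  for (metric in names(minimums)) {
    ok <- ok & !is.na(meta[[metric]]) & meta[[metric]] >= minimums[[metric]]
  }
  meta$Package[ok]
}

# Two hypothetical risk tolerances, declared as data rather than code
strict  <- c(covr_coverage = 0.95, has_vignettes = 1)
lenient <- c(covr_coverage = 0.80)

strict_set  <- conforming_packages(meta, strict)
lenient_set <- conforming_packages(meta, lenient)
```

Because the selection criteria are plain data, they can be published alongside the resulting package list, which is what makes the selection transparent.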

An evolving R ecosystem

In close communication with many beloved R projects

Submissions Working Group

Repositories Working Group

pharmaverse

targeting repos integration

r-lib/pak

targeting pak integration

Pilot Implementation

focus on proving capabilities, quick development

Package: praise
Version: 1.2.3
DownloadURL: 
  github.com/cran/praise.tar.gz
code_coverage: 0.75

Package: survfit
Version: 2.3.4
DownloadURL:
  github.com/cran/survfit.tar.gz
code_coverage: 0.87

Diagram labels:
  • r-hub/repos: package repository, built on CRAN mirror & GitHub Actions
  • {riskscore}: pre-calculated {riskmetric} scores
  • PACKAGES (data joined manually)
library(pharmapkgs)
options(available_packages_filters = 
  risk_filter(code_coverage > 0.8))
available.packages()
pak::pkg_install("survfit")

all modelled after r-hub/repos

Interacting with the repo

Packages risk filters

  • Helper package for system administrators
  • Restricts packages available for installation to those fitting a policy
  • Uses packages metadata in the repo
  • May be used together with manual checks (e.g., read a statistical review)
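
To illustrate how such a filter could plug into base R: `available.packages()` accepts a list of filter functions (via its `filters` argument or the `available_packages_filters` option), each taking the package matrix and returning a subset of its rows. The sketch below is an assumption about the general shape, not the pharmapkgs implementation; `risk_filter_sketch` is a hypothetical name, and the two records reuse the placeholder values from the pilot slide.

```r
# Hypothetical filter factory: captures a policy expression and returns a
# function that subsets an available.packages()-style character matrix
risk_filter_sketch <- function(expr) {
  policy <- substitute(expr)
  env <- parent.frame()
  function(db) {
    meta <- as.data.frame(db, stringsAsFactors = FALSE)
    # DCF fields arrive as character; convert numeric-looking columns
    meta[] <- lapply(meta, function(col) utils::type.convert(col, as.is = TRUE))
    keep <- eval(policy, meta, env)
    db[!is.na(keep) & keep, , drop = FALSE]
  }
}

# Stand-in for the matrix that available.packages() would produce
db <- cbind(
  Package       = c("praise", "survfit"),
  Version       = c("1.2.3", "2.3.4"),
  code_coverage = c("0.75", "0.87")
)
rownames(db) <- db[, "Package"]

policy <- risk_filter_sketch(code_coverage > 0.8)
allowed <- policy(db)
rownames(allowed)
```

In a real session the function would be registered with something like `options(available_packages_filters = list(add = TRUE, policy))`, where `add = TRUE` keeps R's default filters in place; packages with a missing metric are excluded here, which is one possible design choice among several.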

[Diagram: CRAN (20K+ pkgs) → risk_filter() → study-ready pkgs]

As a user*

repo <- "https://raw.githubusercontent.com/pharmaR/repos/main/ubuntu-22.04/4.5"
options(repos = c("pharmaR/repos/ubuntu" = repo))
available.packages()
options(
  available_packages_filters = risk_filter(
    # package is exceptionally well tested
    (quality_code_coverage >= 0.8 & 
      quality_example_coverage >= 0.8 &
      quality_r_cmd_check_errors == 0) |

    # or is exceptionally well adopted
    (percentile(quality_downloads_1yr) > 90 |
      quality_reverse_dependencies_count >= 10) |

    # or seems to follow thorough development practices
    (quality_has_website &
      quality_vignette_count >= 1 &
      quality_author_count >= 3)
  )
)

*aspirational deviations from proof of concept in github.com/pharmaR/pharmapkgs
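
The `percentile()` helper in the policy above is part of the aspirational interface, so its behaviour is not specified here. One plausible reading, sketched below under that assumption, is a percentile rank computed across the whole cohort with the empirical cumulative distribution function. The name `percentile_sketch` and the download counts are hypothetical.

```r
# Hypothetical helper: percentile rank (0-100) of each value within the cohort
percentile_sketch <- function(x) {
  100 * stats::ecdf(x)(x)
}

# Made-up yearly download counts for four placeholder packages
downloads <- c(pkg_a = 120, pkg_b = 45000, pkg_c = 900, pkg_d = 300000)

pct <- percentile_sketch(downloads)
names(pct) <- names(downloads)

# Packages above the 90th percentile of downloads
well_adopted <- names(pct[pct > 90])
```

A relative measure like this moves as the cohort changes, so a policy built on it would need to record which snapshot of the repository it was evaluated against.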

Repository ‘back-end’

Infrastructure setup

  • Hosts risk assessment metadata
  • Links to artifacts of the R-hub check system (via DownloadURL)
  • Integrates with pak::pkg_install
  • Supports multiple levels of risk tolerance

DCF file forked from r-hub/repos

Package: bslib
Version: 0.6.1
Depends: R (>= 2.10), R (>= 4.4), R (< 4.4.99)
License: MIT + file LICENSE
DownloadURL:
         https://github.com/cran/bslib/releases/download/0.6.1/bslib_0.6.1_b4_R4.4_x86_64-pc-linux-gnu-ubuntu-22.04.tar.gz
Built: R 4.4.0; ; 2023-11-29 16:39:06 UTC; unix
RVersion: 4.4
Platform: x86_64-pc-linux-gnu-ubuntu-22.04
Imports: base64enc, cachem, grDevices, htmltools (>= 0.5.7), jquerylib (>= 0.1.3),
         jsonlite, lifecycle, memoise (>= 2.0.1), mime, rlang, sass (>= 0.4.0)
...

Added fields for risk-based assessment

riskmetric_run_date: 2023-06-21
riskmetric_version: 0.2.1
covr_coverage: 0.852
has_vignettes: 1
remote_checks: 0.846
...
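
Because the added fields live in the same DCF record as the standard ones, they can be read with base R's `read.dcf()`. The sketch below parses a shortened copy of the record above from an in-memory text connection; which fields to treat as numeric is an assumption made for illustration.

```r
# Shortened copy of the PACKAGES record shown above
packages_text <- c(
  "Package: bslib",
  "Version: 0.6.1",
  "riskmetric_run_date: 2023-06-21",
  "riskmetric_version: 0.2.1",
  "covr_coverage: 0.852",
  "has_vignettes: 1",
  "remote_checks: 0.846"
)

# read.dcf() returns a character matrix with one row per record
con <- textConnection(packages_text)
db <- read.dcf(con)
close(con)

# Convert the metric fields (assumed numeric) for use in comparisons
metric_fields <- c("covr_coverage", "has_vignettes", "remote_checks")
metrics <- as.data.frame(db, stringsAsFactors = FALSE)
metrics[metric_fields] <- lapply(metrics[metric_fields], as.numeric)

metrics[, c("Package", "Version", metric_fields)]
```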

Packages cohort validation workflow

Risk assessment pipeline

Calculates package QA metadata on updated packages and their reverse dependencies

Produces logs and other reproducibility data

In the future: can run on in-house infrastructure

Packages cohort validation workflow

Package  Version  covr_coverage  has_vignettes
pkg_1    1.15     0.967          1
pkg_2    3.5      0.984          1
pkg_3    1.9      0.992          1
pkg_4    0.5      0.864          0
pkg_5    4.2      0.924          1

Edges: pkg_2 -> pkg_1, pkg_3 -> pkg_1, pkg_3 -> pkg_2, pkg_5 -> pkg_4

Packages cohort validation workflow

Package  Version  covr_coverage  has_vignettes
pkg_1    1.15     ...            ...
pkg_2    3.6      ...            ...
pkg_3    1.9      0.992          1
pkg_4    0.5      0.864          0
pkg_5    4.2      0.924          1

Edges: pkg_2 -> pkg_1, pkg_3 -> pkg_1, pkg_3 -> pkg_2, pkg_5 -> pkg_4

Packages cohort validation workflow

Package  Version  covr_coverage  has_vignettes
pkg_1    1.15     0.967          1
pkg_2    3.6      0.987          1
pkg_3    1.9      0.992          1
pkg_4    0.5      0.864          0
pkg_5    4.2      0.924          1

Edges: pkg_2 -> pkg_1, pkg_3 -> pkg_1, pkg_3 -> pkg_2, pkg_5 -> pkg_4
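
The selection step in this workflow (re-assess an updated package together with its reverse dependencies) can be sketched as a graph traversal. The code below is illustrative, not the pipeline's implementation: `packages_to_reassess` is a hypothetical name, the edge direction is read as "dependency -> dependent" based on the frames above, and following reverse dependencies transitively is an assumption.

```r
# Edges from the workflow frames, read as dependency -> dependent
edges <- data.frame(
  dependency = c("pkg_2", "pkg_3", "pkg_3", "pkg_5"),
  dependent  = c("pkg_1", "pkg_1", "pkg_2", "pkg_4"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: updated packages plus everything that depends on
# them, directly or indirectly (breadth-first traversal)
packages_to_reassess <- function(updated, edges) {
  queue <- updated
  seen <- character(0)
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (current %in% seen) next
    seen <- c(seen, current)
    queue <- c(queue, edges$dependent[edges$dependency == current])
  }
  sort(seen)
}

# pkg_2 moved from 3.5 to 3.6 in the frames above
to_run <- packages_to_reassess("pkg_2", edges)
to_run
```

For the update shown in the frames this selects `pkg_1` and `pkg_2`, matching the two packages whose metrics are recalculated while the others keep their existing values.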

Our roadmap

What’s next

Automating up-to-date quality metrics to support sponsor risk assessment

Package: praise
Version: 1.2.3
DownloadURL: 
  github.com/cran/praise.tar.gz
code_coverage: 0.75

Package: survfit
Version: 2.3.4
DownloadURL:
  github.com/cran/survfit.tar.gz
code_coverage: 0.87

Diagram labels:
  • r-hub/repos: package repository, built on CRAN mirror & GitHub Actions
  • pharmaR/repos: periodically re-calculate metrics for updated packages
  • PACKAGES
library(pharmapkgs)
options(available_packages_filters = 
  risk_filter(code_coverage > 0.8))
available.packages()
pak::pkg_install("survfit")
risk_report("praise")
Reference Image

PDF

Reference container image(s)

Should mimic environments of companies and health authority reviewers

To be used by the Regulatory R Repository for packages cohort validation

Main intent: start a cross-company dialogue on infrastructure

Closing

Thank you

To our Core Team members

  • Coline Zeballos, Roche
  • Doug Kelkhoff, Roche
  • Jaime Pires, Roche
  • Yann Féat, mainanalytics
  • Andrew Borgman, Biogen
  • Astrid Radermacher, Jumping Rivers
  • Colin Gillespie, Jumping Rivers
  • Magnus Mengelbier, Limelogic
  • Nicoles Jones, Denali Therapeutics
  • Ramiro Magno, Pattern Institute
  • Stefan Doering, Boehringer-Ingelheim
  • Kevin Kunzmann, Boehringer-Ingelheim
  • Matthias Trampisch, Boehringer-Ingelheim
  • Wilmar Igl, Icon Plc
  • Lluís Revilla, IrsiCaixa AIDS Research Institute
  • Yoni Sidi, Pinpoint Strategies
  • Zhenglei Gao, Bayer